Stein's method is used to obtain two theorems on multivariate normal approximation. Our main theorem, Theorem 1.2, provides a bound on the distance to normality for any nonnegative random vector. Theorem 1.2 requires multivariate size bias coupling, which we discuss in studying the approximation of distributions of sums of dependent random vectors. In the univariate case, we briefly illustrate this approach for certain sums of nonlinear functions of multivariate normal variables. As a second illustration, we show that the multivariate distribution counting the number of vertices with given degrees in certain random graphs is asymptotically multivariate normal, and we obtain a bound on the rate of convergence. Both examples demonstrate that this approach may be suitable for situations involving non-local dependence. We also present Theorem 1.4 for sums of vectors having a local type of dependence. We apply this theorem to obtain a multivariate normal approximation for the distribution of the random p-vector which counts the number of edges in a fixed graph both of whose vertices have the same given color, when each vertex is colored by one of p colors independently. None of the normal approximation results presented here requires an ordering of the summands related to the dependence structure. This is in contrast to the hypotheses of classical central limit theorems and examples, which involve, e.g., martingale, Markov chain, or various mixing assumptions.
1 Introduction



Stein's method has been successful in assessing the quality of normal and Poisson approximations under various dependence structures. See Stein (1972), Stein (1986), Barbour, Holst and Janson (1992), and references therein. Significant multivariate (or functional) versions of Stein's method appear, for example, in Barbour (1990) and Götze (1991). An important part of Stein's method is the construction of auxiliary random variables (coupling), which are used in the computation of the bounds on the distance between a given random variable and its normal or Poisson approximant.

The coupling variables constructed in the application of Stein's method may not appear explicitly in the final bounds. Applications of results where the auxiliary random variables do appear in the bounds require their explicit construction. Although this feature may make such results more difficult to use, couplings yielding useful bounds may often be found where other methods seem to fail. For the Poisson, theorems of this nature may be found in Barbour, Holst and Janson (1992) and references within. In the main result of this paper, Theorem 1.2, we obtain bounds for normal approximations in terms of such couplings, and provide general guidelines and methods for their construction so that these methods may be applied. The couplings studied here, an instance of a construction of a joint distribution with given marginals, are of independent interest. It is known that constructions of multivariate couplings may be problematic; see, e.g., Dall'Aglio, Kotz and Salinetti (1991). Nevertheless, we are able to provide methods for the required multidimensional coupling constructions, which we illustrate in two applications involving nonlocal dependence, Theorems 4.1 and 4.2.

By the same techniques used to prove Theorem 1.2 (Stein's method and the analysis of the properties of the solution to the partial differential equation (13)), we obtain Theorem 1.4, a result complementary to Theorem 1.2. Theorem 1.4 provides a multivariate normal approximation under conditions of local dependence. Unlike in Theorem 1.2, coupling variables do not appear explicitly in Theorem 1.4.

In order to introduce the couplings needed for the proof and applications of Theorem 1.2, we require the following definition.

Definition 1.1 Given a nonnegative random variable $W$ with distribution $dF(w)$ and mean $\lambda$, $W^*$ is said to have the $W$-size biased distribution if it has distribution $w\,dF(w)/\lambda$.

Note that the distribution of $W^*$ may be characterized by the relation

$$E\,WG(W) = \lambda\,EG(W^*) \tag{1}$$

for all functions $G$ for which the expectations exist. Size biased distributions are well known in sampling theory and renewal theory, for example. The following one dimensional version of our main result illustrates the relevance of size biased coupling to normal approximations.

Theorem 1.1 Let $W$ be a nonnegative random variable with mean $EW = \lambda$ and variance $\sigma^2 = \mathrm{Var}(W)$, and let $W^*$ be jointly distributed with $W$, having the $W$-size biased distribution. Then for any piecewise continuously differentiable $h$,

$$\left|Eh\!\left(\frac{W-\lambda}{\sigma}\right) - \Phi h\right| \le 2\|h\|\frac{\lambda}{\sigma^2}\sqrt{\mathrm{Var}\,E(W^*-W\mid W)} + \|h'\|\frac{\lambda}{\sigma^3}E(W^*-W)^2, \tag{2}$$

where $\|\cdot\|$ denotes the supremum norm, and $\Phi h = Eh(Z)$ with $Z$ a standard normal variate.

Theorem 1.1 is an extension of a result of Baldi, Rinott and Stein (1989). The theorem requires the construction of $W^*$ on a joint space with $W$; hence, obtaining good bounds in any particular application depends on the construction of a $W^*$ which will be close to $W$ in an appropriate sense.
However, since the resulting bound is valid for any construction for which the marginal distribution of $W^*$ coincides with the $W$-size biased distribution, one has the flexibility to choose constructions which result in good computable bounds.
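For a finitely supported $W$, Definition 1.1 and relation (1) can be checked exactly. The following is a minimal sketch in Python, which is our choice since the paper contains no code. It uses exact rational arithmetic from the standard `fractions` module, and the helper names `size_biased`, `binomial_pmf` and `check_relation` are hypothetical. It also confirms the familiar fact that if $W\sim\mathrm{Binomial}(n,p)$, then $W^*$ has the law of $1+\mathrm{Binomial}(n-1,p)$.

```python
from fractions import Fraction
from math import comb


def size_biased(pmf):
    """W-size biased law w P(W = w) / EW of a nonnegative, finitely supported pmf."""
    lam = sum(w * p for w, p in pmf.items())
    return {w: w * p / lam for w, p in pmf.items() if w * p != 0}


def binomial_pmf(n, p):
    """Binomial(n, p) pmf as {k: P(W = k)}."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def check_relation(pmf, G):
    """Return both sides of relation (1): E[W G(W)] and lambda E[G(W*)]."""
    lam = sum(w * p for w, p in pmf.items())
    star = size_biased(pmf)
    lhs = sum(w * G(w) * p for w, p in pmf.items())
    rhs = lam * sum(G(w) * q for w, q in star.items())
    return lhs, rhs


pmf = binomial_pmf(5, Fraction(1, 3))
lhs, rhs = check_relation(pmf, lambda w: w ** 2 + 3)
print(lhs == rhs)  # True
```

Exact fractions are used so that the identity is tested without floating-point tolerance. Python 3.8 or later is assumed for `math.comb`.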

Here is a brief description of a method which leads to a construction of a $W$-size biased variate $W^*$ when $W = X_1 + \cdots + X_n$ is a sum of random variables. To begin, if $X_1,\dots,X_n$ are i.i.d. nonnegative random variables with finite mean, then $W^*$ can be constructed by replacing any single summand, say $X_1$, by an independent variable $X_1^*$ with the $X_1$-size biased distribution, i.e., $W^* = X_1^* + X_2 + \cdots + X_n$. More generally, if $W$ is a sum of non-i.i.d. variates, then a similar construction of $W^*$ may be given by replacing $X_I$ by $X_I^*$, where the random index $I$ is chosen independently with $P(I=i) = EX_i/\sum_j EX_j$, and adjusting the remaining variables to their conditional distribution given the new value of $X_I$.
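When the summands are independent, no adjustment of the remaining variables is needed. In that case the construction above can be verified exactly on small discrete examples. The following is a minimal sketch, assuming three independent, finitely supported summands chosen arbitrarily for illustration; all function names are hypothetical. It computes the law of $W^* = W - X_I + X_I^*$ and compares it with $w\,P(W=w)/EW$.

```python
from fractions import Fraction


def convolve(a, b):
    """Law of the sum of two independent variables given as {value: prob}."""
    out = {}
    for x, p in a.items():
        for y, q in b.items():
            out[x + y] = out.get(x + y, 0) + p * q
    return out


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def size_biased(pmf):
    """x P(X = x) / EX, dropping the atom at zero."""
    m = mean(pmf)
    return {x: x * p / m for x, p in pmf.items() if x * p != 0}


def law_of_sum(pmfs):
    total = {0: Fraction(1)}
    for pmf in pmfs:
        total = convolve(total, pmf)
    return total


def coupled_law(pmfs):
    """Law of W* = W - X_I + X_I^*, with P(I = i) = EX_i / sum_j EX_j and X_I^*
    drawn independently from the X_I-size biased law. The summands are assumed
    independent, so the other summands need no adjustment."""
    lam = sum(mean(p) for p in pmfs)
    out = {}
    for i, pmf in enumerate(pmfs):
        rest = law_of_sum(pmfs[:i] + pmfs[i + 1:])
        for w, q in convolve(rest, size_biased(pmf)).items():
            out[w] = out.get(w, 0) + mean(pmf) / lam * q
    return out


pmfs = [{0: Fraction(1, 2), 1: Fraction(1, 2)},
        {0: Fraction(2, 3), 2: Fraction(1, 3)},
        {1: Fraction(1, 4), 3: Fraction(3, 4)}]
print(coupled_law(pmfs) == size_biased(law_of_sum(pmfs)))  # True
```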

A special case of this idea is Midzuno's procedure (e.g., Cochran (1977)), where a size biased variable is used to obtain unbiased ratio estimators in finite population sampling. To describe Midzuno's procedure, let nonnegative "sizes" $X_1,\dots,X_n$ be obtained by sampling from a finite population without replacement, and let $W$ be their sum. Then $W^*$ is realized by sampling the first variate in proportion to its size, removing it from the population, and sampling the other variables without replacement from the population that remains, that is, sampling from the resulting conditional distribution.
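Midzuno's procedure can be checked by enumerating all samples from a tiny population. Under the scheme, a sample $S$ is selected with probability proportional to its total size. This is exactly the simple-random-sampling law reweighted by $W/EW$, so the sample total under the scheme has the $W$-size biased distribution. The following is a minimal sketch; the population sizes and sample size are arbitrary, positive illustrative values, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def midzuno_design(sizes, n):
    """P(S) under Midzuno's scheme: the first unit is drawn with probability
    proportional to its size, and the other n - 1 units by simple random sampling."""
    N, total = len(sizes), sum(sizes)
    probs = {}
    for first in range(N):
        others = [j for j in range(N) if j != first]
        for rest in combinations(others, n - 1):
            s = frozenset((first,) + rest)
            probs[s] = probs.get(s, 0) + Fraction(sizes[first], total) / comb(N - 1, n - 1)
    return probs


def size_biased_srs(sizes, n):
    """P(S) = (W(S) / EW) * P_SRS(S), where W(S) is the sample total."""
    N = len(sizes)
    ew = Fraction(n * sum(sizes), N)  # E W under simple random sampling
    return {frozenset(s): Fraction(sum(sizes[j] for j in s)) / ew / comb(N, n)
            for s in combinations(range(N), n)}


sizes = [1, 4, 2, 7, 3]
print(midzuno_design(sizes, 3) == size_biased_srs(sizes, 3))  # True
```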

Further flexibility is obtained by realizing that for any representation of $W$ in the form

$$W = \psi_1(U_1) + \cdots + \psi_n(U_n), \tag{3}$$

the above construction of $W^*$ may be accomplished as follows. Choose a random index $I$ such that $P(I=i) = E\psi_i(U_i)/\sum_{j=1}^n E\psi_j(U_j)$. If $I=i$, replace $U_i$ by an independent variable with distribution $\psi_i(u)P(U_i\in du)/E\psi_i(U_i)$, and adjust the remaining $U$ variables. Therefore, the theorem may be applied whenever one can find a transformation such that the variables $U_1,\dots,U_n$ have a dependence structure that allows the computation of the conditional distribution required in the bounds. Further details and examples of these size bias coupling constructions and their applications will be provided in Sections 2 and 4.

For the multivariate case, we need a more general notion of size biasing, and we replace the $*$ notation by a superscript $\beta$ in order to identify in which "coordinate" or variable the variates are size biased.

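Definition 1.2 Let $\mathcal I$ be an arbitrary index set and let $\mathbf X = \{X_\alpha : \alpha\in\mathcal I\}$ be a collection of nonnegative random variables with joint distribution $dF(\mathbf x)$ and means $EX_\alpha = \lambda_\alpha$. For $\beta\in\mathcal I$, we say that $\mathbf X^\beta = \{X_\alpha^\beta : \alpha\in\mathcal I\}$ has the $\mathbf X$-size biased distribution in the $\beta$th coordinate if $\mathbf X^\beta$ has the joint distribution $x_\beta\,dF(\mathbf x)/\lambda_\beta$.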

The distribution of $\mathbf X^\beta$ is characterized by the relations

$$EX_\beta G(\mathbf X) = \lambda_\beta EG(\mathbf X^\beta) \tag{4}$$

for all functions $G$ for which the above expectations exist. When the function $G$ depends on $\mathbf X$ only through $X_\beta$, equation (4) yields $EX_\beta G(X_\beta) = \lambda_\beta EG(X_\beta^\beta)$. Hence, comparing to (1), we see that the $\beta$th coordinate of $\mathbf X^\beta$, that is, the variate $X_\beta^\beta$, has the $X_\beta$-size biased distribution in the sense of Definition 1.1.

By considering the case where the collection $\mathbf X$ consists of only a single random variable, we see that equation (4) reduces to (1); hence Definition 1.1 is a special case of Definition 1.2.

We will apply Definition 1.2 to a vector $\mathbf W = (W_1,\dots,W_p)\in\mathbb R^p$ by identifying it with the collection $\{W_j : j\in\mathcal I\}$ with $\mathcal I = \{1,\dots,p\}$. Letting $EW_i = \lambda_i$, we see that the vector $\mathbf W^i = (W_1^i,\dots,W_p^i)$ is characterized by

$$EW_iG(\mathbf W) = \lambda_iEG(\mathbf W^i). \tag{5}$$
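For a finitely supported random vector, $\mathbf W^i$ is obtained by reweighting each outcome $\mathbf w$ by $w_i/\lambda_i$. The following is a minimal sketch, assuming an arbitrary bivariate law chosen only for illustration; the helper names are hypothetical. The code constructs $\mathbf W^i$ this way. The accompanying checks confirm relation (5), the fact that the $i$th coordinate of $\mathbf W^i$ is $W_i$-size biased, and the covariance identity $\sigma_{ij} = \lambda_iE(W_j^i - W_j)$ that is used as (17) in Section 3.

```python
from fractions import Fraction


def tilt(joint, i):
    """Law of W^i in (5): P(W^i = w) = w_i P(W = w) / E W_i."""
    lam_i = sum(w[i] * p for w, p in joint.items())
    return {w: w[i] * p / lam_i for w, p in joint.items() if w[i] * p != 0}


def expect(law, f):
    """E f(W) for a finitely supported law {outcome: probability}."""
    return sum(f(w) * p for w, p in law.items())


def coordinate_law(law, k):
    """Marginal law of the k-th coordinate."""
    out = {}
    for w, p in law.items():
        out[w[k]] = out.get(w[k], 0) + p
    return out


# An arbitrary joint law of a nonnegative random vector W = (W_1, W_2).
joint = {(0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (0, 2): Fraction(1, 4),
         (2, 1): Fraction(1, 4), (3, 3): Fraction(1, 4)}
W1 = tilt(joint, 0)  # W size biased in its first coordinate
```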
Relation (5) leads to a multivariate normal approximation theorem, for which we introduce the following notation (see, e.g., Horn and Johnson (1985)). Given a vector $\mathbf a\in\mathbb R^p$, let $\|\mathbf a\| = \max_{1\le j\le p}|a_j|$. Given a $p\times p$ matrix $A = (a_{ij})$, we set $\|A\| = \max_{1\le i,j\le p}|a_{ij}|$, and more generally, for any array, $\|\cdot\|$ will denote its maximal absolute value. For an array of functions, say $A(\mathbf w) = \{a_i(\mathbf w)\}$, where $i$ could stand for a multiple index, $\|A\| = \sup_{\mathbf w}\max_i|a_i(\mathbf w)|$. For a smooth function $h:\mathbb R^p\to\mathbb R$, we let $\nabla h$ or $Dh$ denote the vector of first partial derivatives of $h$, $D^2h$ the usual Hessian matrix of second order partial derivatives, and $D^kh$ the $k$th derivative of $h$ in general.

Our main multivariate result is the following theorem: 

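Theorem 1.2 Let $\mathbf W$ be a random vector in $\mathbb R^p$ with nonnegative components. Set $\boldsymbol\lambda = (\lambda_1,\dots,\lambda_p) = E\mathbf W$, and assume $\mathrm{Var}\,\mathbf W = \Sigma = (\sigma_{ij})$ is invertible. For each $i = 1,\dots,p$ let $(\mathbf W,\mathbf W^i)$ be random vectors defined on a joint probability space with $\mathbf W^i$ having the $\mathbf W$-size biased distribution in the $i$th coordinate as in (5). Let $h:\mathbb R^p\to\mathbb R$ be a function having bounded mixed partial derivatives up to order 3. Let $\Phi h = Eh(\mathbf Z)$, where $\mathbf Z$ denotes a standard (mean zero, covariance $I$) normal vector in $\mathbb R^p$. Then

$$\begin{aligned}\left|Eh(\Sigma^{-1/2}(\mathbf W-\boldsymbol\lambda)) - \Phi h\right| \le{}& \frac{p^2}{2}\|\Sigma^{-1/2}\|^2\|D^2h\|\sum_{i=1}^p\sum_{j=1}^p\lambda_i\sqrt{\mathrm{Var}\,E(W_j^i - W_j\mid\mathbf W)}\\ &+ \frac{p^3}{6}\|\Sigma^{-1/2}\|^3\|D^3h\|\sum_{i=1}^p\sum_{j=1}^p\sum_{k=1}^p\lambda_iE\left|(W_j^i-W_j)(W_k^i-W_k)\right|.\end{aligned}\tag{6}$$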

Note that the theorem does not require the joint construction of $(\mathbf W^1,\dots,\mathbf W^p)$. Although Theorems 1.1 and 1.2 are stated for nonnegative variates, they may be applied to general variates by translation and truncation.

In Section 2 we discuss the construction of the vectors $\mathbf W^i$ required for Theorem 1.2 when the components of $\mathbf W$ are sums of dependent random variables. Specifically, when $\mathbf X = \{X_\alpha, \alpha\in\mathcal I\}$ is any collection of nonnegative random variables, and $A_1,\dots,A_p$ are any subsets of $\mathcal I$, we may apply Theorem 1.2 to the vector $\mathbf W = (W_1,\dots,W_p)$, where $W_j = \sum_{\alpha\in A_j}X_\alpha$. In particular, we obtain a result for a sum $\mathbf W = (W_1,\dots,W_p) = \sum_{u=1}^n\mathbf X_u$ of nonnegative dependent random vectors $\mathbf X_u = (X_{u1},\dots,X_{up})$, $u = 1,\dots,n$, by letting $\mathcal I$ be a set of double indices and $A_j = \{1,\dots,n\}\times\{j\}$.

We briefly indicate how size biased variables arise in one dimensional normal approximations. Given a random variable $W$ and a test function $h$, one can compute $E[h((W-\lambda)/\sigma) - \Phi h]$ by computing $E[f'(W) - ((W-\lambda)/\sigma^2)f(W)]$, where $f$ is the bounded solution of the Stein equation

$$f'(w) - \frac{w-\lambda}{\sigma^2}f(w) = h\!\left(\frac{w-\lambda}{\sigma}\right) - \Phi h. \tag{7}$$

If $W^*$ has the $W$-size biased distribution, and therefore satisfies $E\,Wf(W) = EW\,Ef(W^*)$, we obtain

$$Eh\!\left(\frac{W-\lambda}{\sigma}\right) - \Phi h = E\Big[f'(W) - \frac{W-\lambda}{\sigma^2}f(W)\Big] = E\Big[f'(W) - \frac{\lambda}{\sigma^2}\big(f(W^*) - f(W)\big)\Big].$$

Taylor expansion of $E[f(W^*) - f(W)]$ is then the first step in obtaining the bound in Theorem 1.1. Note that the one dimensional versions of the theorems are not exactly special cases of their multivariate counterparts. In the multivariate case, equation (7) will be replaced by the partial differential equation (13), resulting in different orders of the derivatives of $h$ appearing in the one and multidimensional theorems.

The size biased coupling approach handles cases where there is global dependence among the summand variables. In contrast, the following univariate and multivariate results, which are not based on size biased couplings, are very useful in cases of local dependence. The following theorem is due to Stein (1986); our Theorem 1.4 is a multivariate version.
Theorem 1.3 Let $X_v$, $v = 1,\dots,n$, be random variables with $EX_v = 0$. Let $S_v$, $v = 1,\dots,n$, be subsets of $\{1,\dots,n\}$, set $W = \sum_{v=1}^nX_v$, and denote

$$\sigma^2 = \sum_{v=1}^n\sum_{u\in S_v}EX_vX_u,$$

assuming $\sigma^2>0$. Then for any $h:\mathbb R\to\mathbb R$ which is continuous and piecewise continuously differentiable,

$$|Eh(W/\sigma) - \Phi h| \le \frac{2\|h\|}{\sigma^2}\sqrt{E\Big\{\sum_{v=1}^n\sum_{u\in S_v}(X_vX_u - EX_vX_u)\Big\}^2} + [\text{remaining terms illegible in the source}] \tag{8}$$

In typical applications of Theorem 1.3, we have $X_v$ independent of $\{X_u : u\notin S_v\}$, and the second term of the bound in (8) vanishes. In this case we may view $S_v$ as a dependency neighborhood of $X_v$. Generally, the bound in (8) is small if these neighborhoods are small, so this theorem is useful when the dependence is local.
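As an illustration of dependency neighborhoods, consider the hypothetical example $X_v = \xi_v\xi_{v+1} - p^2$, where the $\xi$'s are i.i.d. Bernoulli$(p)$ and $S_v = \{u : |u-v|\le1\}$. This example is ours, not the paper's. Here $X_v$ is independent of $\{X_u : u\notin S_v\}$, so $\sigma^2$ of Theorem 1.3 should coincide with $\mathrm{Var}\,W$. The following minimal sketch checks this by exact enumeration, assuming Python's `fractions` and `itertools`.

```python
from fractions import Fraction
from itertools import product


def local_sigma2_and_var(n, p):
    """X_v = xi_v xi_{v+1} - p^2 (v = 0..n-1) for iid Bernoulli(p) xi_0..xi_n, with
    neighbourhoods S_v = {u : |u - v| <= 1}. Returns (sigma^2 of Theorem 1.3, Var W)."""
    e_xx = {}
    e_w2 = Fraction(0)
    for xi in product((0, 1), repeat=n + 1):
        prob = Fraction(1)
        for b in xi:
            prob *= p if b else 1 - p
        x = [xi[v] * xi[v + 1] - p * p for v in range(n)]  # mean-zero summands
        for v in range(n):
            for u in range(n):
                e_xx[v, u] = e_xx.get((v, u), 0) + prob * x[v] * x[u]
        e_w2 += prob * sum(x) ** 2
    sigma2 = sum(e_xx[v, u] for v in range(n) for u in range(n) if abs(u - v) <= 1)
    return sigma2, e_w2  # EW = 0, so E W^2 = Var W


sigma2, var_w = local_sigma2_and_var(5, Fraction(1, 3))
print(sigma2 == var_w)  # True
```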

The following result, which is particularly useful for normal approximations of sums of locally dependent random vectors, extends Theorem 1.3 to the multivariate case.

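Theorem 1.4 Let $\{X_\alpha, \alpha\in\mathcal I\}$ be random variables with $EX_\alpha = 0$. Let $A_1,\dots,A_p$ be subsets of $\mathcal I$, and set $\mathbf W = (W_1,\dots,W_p)$, where $W_j = \sum_{\alpha\in A_j}X_\alpha$. For each $\alpha\in\mathcal I$ let $S_\alpha\subset\mathcal I$, and assume that $\Sigma = (\sigma_{ij})$ is symmetric positive definite, where

$$\sigma_{ij} = \sum_{\alpha\in A_i}\sum_{\beta\in A_j\cap S_\alpha}EX_\alpha X_\beta.$$

Let $h:\mathbb R^p\to\mathbb R$ be a function having bounded mixed partial derivatives up to order 3, and let $\Phi h = Eh(\mathbf Z)$, where $\mathbf Z$ denotes a standard normal vector in $\mathbb R^p$. Then

$$\begin{aligned}|Eh(\Sigma^{-1/2}\mathbf W) - \Phi h| \le{}& \frac{p^2}{2}\|\Sigma^{-1/2}\|^2\|D^2h\|\sum_{i=1}^p\sum_{j=1}^p\sqrt{E\Big\{\sum_{\alpha\in A_i}\sum_{\beta\in A_j\cap S_\alpha}(X_\alpha X_\beta - EX_\alpha X_\beta)\Big\}^2}\\ &+ p\|\Sigma^{-1/2}\|\,\|Dh\|\sum_{i=1}^p\sum_{\alpha\in A_i}E\big|E[X_\alpha\mid X_\beta:\beta\notin S_\alpha]\big|\\ &+ \frac{p^3}{6}\|\Sigma^{-1/2}\|^3\|D^3h\|\sum_{i=1}^p\sum_{j=1}^p\sum_{k=1}^p\sum_{\alpha\in A_i}E\Big|X_\alpha\sum_{\beta\in A_j\cap S_\alpha}X_\beta\sum_{\gamma\in A_k\cap S_\alpha}X_\gamma\Big|.\end{aligned}\tag{9}$$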

Note that $\sigma^2$ of Theorem 1.3 and $\Sigma$ of Theorem 1.4 are not necessarily equal to the variance of $W$ and the covariance matrix of $\mathbf W$, respectively. In Theorem 1.4, the symmetry of $\Sigma$ is guaranteed if the sets $S_\alpha\subset\mathcal I$ are symmetric in the sense that $\beta\in S_\alpha$ if and only if $\alpha\in S_\beta$. In particular, in applying Theorem 1.4 to a sum of mean zero random vectors, $\mathbf W = \sum_{u=1}^n\mathbf X_u$, where $\mathbf X_u = (X_{u1},\dots,X_{up})$ for $u = 1,\dots,n$, it is natural to take neighborhoods of the form $S_\alpha = S_{(u,i)} = T_u\times\{1,\dots,p\}$, where the $T_u$ are symmetric subsets of $\{1,\dots,n\}$.
In our applications, the sets $S_\alpha$ contain $\{\beta\in\mathcal I : \mathrm{Cov}(X_\beta,X_\alpha)\ne0\}$; in any such case, $\Sigma = \mathrm{Cov}(\mathbf W)$. In particular, $\Sigma = \mathrm{Cov}(\mathbf W)$ if $X_\beta$ is independent of $X_\alpha$ for every $\beta\notin S_\alpha$. In the general case, the above (somewhat unusual) choice of $\Sigma$ simplifies the form of the bound. With the more natural choice $\Sigma = \mathrm{Cov}(\mathbf W)$, the present technique applies to yield a version of the above theorem, but an additional term in the bound may result.

Applications of the above theorems are given in Section 4. As an illustration involving non-local dependence, we apply Theorem 1.1 and its multivariate extension, Theorem 1.2, to show that the multivariate distribution counting the number of vertices with given degrees in certain random graphs is asymptotically multivariate normal, and we obtain a bound on the rate of convergence. To illustrate a case of local dependence, we apply Theorem 1.3 and its multivariate extension, Theorem 1.4, to obtain a multivariate normal approximation for the distribution of the random p-vector which counts the number of edges in a fixed graph both of whose vertices have the same given color, when each vertex is colored by one of p colors independently. Applications related to representations of $W$ as in (3) will be given where the $U$ variables are normal and multinomial. The ideas and results presented here have been applied in work of Luk (1994) in finite population sampling, and of Reinert (1994) in the study of empirical measures.

The proofs of Theorems 1.2 and 1.4 are given in Section 3. Theorems 1.1 and 1.3 may be proved similarly.

The theorems presented here supply approximations in terms of expectations of smooth test functions $h$, allowing our main theorems to be presented in a form where they can be readily applied under unrestrictive, simple conditions. In the context of Stein's method, Stein (1986), Baldi, Rinott and Stein (1989), Götze (1991), and Rinott (1994), among others, also consider nonsmooth functions, usually at the expense of added technical detail or some loss of information in the bounds. It is possible to obtain certain multivariate versions of our results for nonsmooth functions $h$ using the methodology developed in Götze (1991); see Rinott and Rotar (1994). In the present paper, our main focus is on the coupling structure. The issue of smooth versus nonsmooth function approximation is discussed in Barbour, Karoński and Ruciński (1989).



2 Construction of size biased couplings

The construction of size biased variables required for the application of Theorems 1.1 and 1.2 is the focus of this section. While the details depend on the case at hand, this section will provide general guidelines that extend and unify ideas which appeared in Baldi, Rinott and Stein (1989) and Stein (1992), where only univariate sums of zero-one variables were studied.

The following lemma is the key to the construction of coupled variables satisfying equations (1) and (5), which are required in Theorems 1.1 and 1.2, respectively. Readers interested only in the univariate case may read the lemma below with $\mathcal I = A = B$.

Lemma 2.1 Let $\mathcal I$ be an arbitrary index set, and let $\mathbf X = \{X_\alpha : \alpha\in\mathcal I\}$ be a collection of nonnegative random variables. For any subset $B\subset\mathcal I$, set $X_B = \sum_{\beta\in B}X_\beta$ and $\lambda_B = EX_B$. Suppose $B\subset\mathcal I$ with $\lambda_B<\infty$, and for $\beta\in B$ let $\mathbf X^\beta$ have the $\mathbf X$-size biased distribution in coordinate $\beta$ as in Definition 1.2. Let $\mathbf X^B$ be distributed as the mixture of the distributions of $\mathbf X^\beta$, $\beta\in B$, with weights $\lambda_\beta/\lambda_B$. Then

$$EX_BG(\mathbf X) = \lambda_BEG(\mathbf X^B). \tag{10}$$

Hence, for any $A\subset\mathcal I$, if $G$ is a function of $X_A$ only, then

$$EX_BG(X_A) = \lambda_BEG(X_A^B), \tag{11}$$

where $X_A^B = \sum_{\alpha\in A}X_\alpha^B$. In particular, by taking $A = B$ in (11) we have $EX_AG(X_A) = \lambda_AEG(X_A^A)$, and hence $X_A^A$ has the $X_A$-size biased distribution in the sense of Definition 1.1.

Proof: For a function $G$ on $\mathbf X$, we have $EG(\mathbf X^\beta) = EX_\beta G(\mathbf X)/\lambda_\beta$ by equation (4). Multiplying by $\lambda_\beta/\lambda_B$ and summing over $\beta\in B$ yields (10). The remainder of the lemma now follows. □

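Construction of $\mathbf X^B$: Since $\mathbf X^B$ is a mixture of the distributions of $\mathbf X^\beta$ for $\beta\in B$ with weights $\lambda_\beta/\lambda_B$, proceed as follows. Given the collection $\mathbf X = \{X_\alpha : \alpha\in\mathcal I\}$, first choose an independent index $I\in B$ according to the distribution $P(I=\beta) = \lambda_\beta/\lambda_B$. If $I = \beta$, construct $X_\beta^\beta$ to have the $X_\beta$-size biased distribution $x_\beta P(X_\beta\in dx_\beta)/\lambda_\beta$. If $X_\beta^\beta = x$, the remaining variates $X_\alpha^\beta$, $\alpha\ne\beta$, are constructed so that, conditionally on $X_\beta^\beta = x$, the collection $\mathbf X^\beta$ has the law $P(\mathbf X\in d\mathbf x\mid X_\beta = x)$. This construction yields

$$P(\mathbf X^\beta\in d\mathbf x) = P(\mathbf X\in d\mathbf x\mid X_\beta = x_\beta)\,x_\beta P(X_\beta\in dx_\beta)/\lambda_\beta,$$

that is, $\mathbf X^\beta\sim x_\beta\,dF(\mathbf x)/\lambda_\beta$, and indeed $\mathbf X^\beta$ has the $\mathbf X$-size biased distribution in the $\beta$th coordinate, as given in Definition 1.2.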
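The three-step construction of $\mathbf X^B$ can be traced exactly on a small discrete joint law. The steps are a random index, a size biased coordinate, and a conditional draw for the rest. The following is a minimal sketch, assuming an arbitrary law of three dependent nonnegative variables in which every listed outcome has positive probability; the function name `construct_XB` is hypothetical. The result is compared with the law $x_B\,P(\mathbf X=\mathbf x)/\lambda_B$ that (10) requires.

```python
from fractions import Fraction


def construct_XB(joint, B):
    """Exact law of X^B produced by the three-step construction: pick I = b in B
    with probability lambda_b / lambda_B, draw X_b^b from the X_b-size biased
    marginal, then draw the other coordinates from P(X | X_b = that value)."""
    lam = {b: sum(x[b] * p for x, p in joint.items()) for b in B}
    lam_B = sum(lam.values())
    out = {}
    for b in B:
        marg = {}
        for x, p in joint.items():
            marg[x[b]] = marg.get(x[b], 0) + p
        for v, pv in marg.items():
            if v == 0:
                continue  # the size biased marginal puts no mass at zero
            p_sb = v * pv / lam[b]
            for x, p in joint.items():
                if x[b] == v:  # conditional law P(X = x | X_b = v) = p / pv
                    out[x] = out.get(x, 0) + lam[b] / lam_B * p_sb * p / pv
    return out


# An arbitrary joint law of three dependent nonnegative variables, B = {0, 2}.
joint = {(0, 1, 2): Fraction(1, 6), (1, 0, 0): Fraction(1, 6), (2, 2, 1): Fraction(1, 3),
         (0, 0, 0): Fraction(1, 12), (1, 1, 1): Fraction(1, 4)}
B = (0, 2)
lam_B = sum((x[0] + x[2]) * p for x, p in joint.items())
expected = {x: (x[0] + x[2]) * p / lam_B for x, p in joint.items() if x[0] + x[2] > 0}
print(construct_XB(joint, B) == expected)  # True
```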

In the univariate case, with $W = \sum_{\beta\in\mathcal I}X_\beta = X_{\mathcal I}$ and $A = B = \mathcal I$, equation (11) in Lemma 2.1 shows that a construction of $W^*$ satisfying (1) may be obtained by setting $W^* = X_{\mathcal I}^{\mathcal I}$. Hence $W^*$ may be constructed as follows. A summand $X_\beta$ of $W$, chosen with probability $EX_\beta/EW$, is replaced by a new value from its size biased distribution. The remaining summands are then adjusted to have the conditional distribution of $\mathbf X$, conditioned on the event that, for the chosen $\beta$, $X_\beta$ takes the new value.

If the variates $\{X_\alpha : \alpha\in\mathcal I\}$ are independent, the last step is not needed, since by independence the conditioning is irrelevant. In this case, the construction of a $W$-size biased variable $W^*$ reduces to size biasing a single randomly chosen summand $X_I$. If the $\{X_\alpha : \alpha\in\mathcal I\}$ are all zero-one variates, we simply have $X_\beta^\beta = 1$. So when $W$ is a sum of independent zero-one random variables, the coupling is accomplished by choosing an index $I = \beta$ with probabilities proportional to $P(X_\beta = 1)$, setting $X_\beta = 1$, and leaving the remaining variates unchanged.

In the multivariate case, the connection between Lemma 2.1 and Theorem 1.2 for approximating sums of random variables is as follows. Given $\{X_\alpha : \alpha\in\mathcal I\}$, let $A_1,\dots,A_p$ be subsets of $\mathcal I$, and set

$$\mathbf W = (W_1,\dots,W_p),\ \text{where } W_j = X_{A_j},\qquad \mathbf W^i = (W_1^i,\dots,W_p^i),\ \text{where } W_j^i = X_{A_j}^{A_i}.$$

When $B = A_i$ and $G(\mathbf X)$ is a function depending on $\mathbf X$ only through $\mathbf W$, equation (10) yields equation (5). Therefore, one may obtain the vector $\mathbf W^i$ satisfying (5) by constructing $\mathbf X^{A_i}$ using a random index $I\in A_i$ with $P(I=\beta) = \lambda_\beta/\lambda_{A_i}$, as described above.

In particular, the sum of random vectors $\mathbf W = \sum_{u=1}^n\mathbf X_u$, where $\mathbf X_u = (X_{u1},\dots,X_{up})$, corresponds to the choice $\mathcal I = \{1,\dots,n\}\times\{1,\dots,p\}$ and $A_j = \{(u,j) : u = 1,\dots,n\}$. In Section 4 we apply this multivariate construction in the setup where $X_{uj}$ is the indicator of the event that the degree of the vertex $u$ in the random graph $K = K_{n,\pi}$ equals a prescribed number $d_j$. Hence, $W_j$ is the number of vertices of $K$ of degree $d_j$. The coupling in this case is accomplished as follows. Since the means $EX_{u1}$ are equal for indices in $A_1$, to construct $\mathbf X^{A_1}$ it is required to choose an index, say $v$, uniformly over $\{1,\dots,n\}$ and to size bias $X_{v1}$ for this $v$. As $X_{v1}$ is an indicator, size biasing is accomplished by replacing $X_{v1}$ by the constant 1. The above construction now requires that the remaining variables have their original distribution conditioned on $X_{v1} = 1$. If $X_{v1}$ was initially 1, that is, if the degree of vertex $v$ was $d_1$, no change is required. Otherwise, by adding or removing randomly chosen edges as appropriate, the degree of $v$ is made to be $d_1$, thereby size biasing the indicator $X_{v1}$. This procedure results in a new graph $K^1$, in which the other variables now have the proper conditional distribution.
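The claim that the modified graph has the proper conditional distribution can be checked by brute force on a very small graph. The following is a minimal sketch, assuming $n = 4$ vertices, edge probability $\pi = 1/3$, vertex $v = 0$ and target degree $d = 2$; these are arbitrary illustrative values, and the helper names are hypothetical. It computes the exact law of the modified graph and compares it with the law of $K$ conditioned on $D(v) = d$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def graph_prob(edges, pairs, pi):
    """P(K = edges) when each pair is an edge independently with probability pi."""
    e = len(edges)
    return pi ** e * (1 - pi) ** (len(pairs) - e)


def size_bias_degree(n, pi, v, d):
    """Exact law of the modified graph obtained by forcing vertex v to degree d:
    surplus edges at v are deleted, or missing ones added to uniformly chosen
    non-neighbours, every admissible choice being equally likely."""
    pairs = list(combinations(range(n), 2))
    at_v = [pr for pr in pairs if v in pr]
    law = {}
    for present in product((0, 1), repeat=len(pairs)):
        edges = frozenset(pr for pr, on in zip(pairs, present) if on)
        p = graph_prob(edges, pairs, pi)
        incident = [pr for pr in at_v if pr in edges]
        missing = [pr for pr in at_v if pr not in edges]
        deg = len(incident)
        if deg >= d:
            # Remove deg - d incident edges (no change when deg == d).
            choices = [edges - set(drop) for drop in combinations(incident, deg - d)]
        else:
            # Add d - deg edges from v to vertices it is not yet joined to.
            choices = [edges | set(add) for add in combinations(missing, d - deg)]
        for new in choices:
            law[new] = law.get(new, 0) + p / len(choices)
    return law


n, pi, v, d = 4, Fraction(1, 3), 0, 2
law = size_bias_degree(n, pi, v, d)
p_d = comb(n - 1, d) * pi ** d * (1 - pi) ** (n - 1 - d)  # P(D(v) = d)
print(sum(law.values()))  # 1
```

Exact enumeration is practical only for tiny $n$. It is meant to illustrate the coupling, not to compute bounds.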

The following comments pertain to the random choice of index that appears in the above constructions. In certain cases the size biased distribution can be constructed with a deterministic index; however, such constructions may lead to larger bounds than those obtained using randomization.

We specialize to $p = 1$ and the expression $\mathrm{Var}\,E[W^*-W\mid W]$, which appears in the first term of the bound in (2). If $X_1,\dots,X_n$ are exchangeable, then it is easy to see that $W$ may be size biased by size biasing $X_1$; that is, in the above description one can set $I = 1$ deterministically, and the rest is done as above. This is true for the example where $W$ counts the number of vertices of degree $d$ in the random graph $K$. Often, in order to make calculations tractable, it is necessary to condition on a larger $\sigma$-field $\mathcal F\supset\sigma\{W\}$ and to replace $\mathrm{Var}\,E[W^*-W\mid W]$ by the larger quantity $\mathrm{Var}\,E[W^*-W\mid\mathcal F]$. However, $\mathrm{Var}\,E[W^*-W\mid\mathcal F]$ may not give rise to a useful bound unless $I$ is randomized. This is the case in the random graph example when $\mathrm{Var}\,E[W^*-W\mid W]$ is replaced by $\mathrm{Var}\,E[W^*-W\mid K]$.

This difficulty can be seen even in the case of independent, identically distributed zero-one random variables, where conditioning on $\mathbf X$ is the analog of conditioning on $K$. There, size biasing with a random $I$ leads to $\mathrm{Var}\,E[W^*-W\mid\mathbf X] = \mathrm{Var}\,E[1-X_I\mid\mathbf X] = \mathrm{Var}\,E[X_I\mid\mathbf X] = \mathrm{Var}(W/n)$, which is of order $1/n$. Setting $I = 1$ instead gives $\mathrm{Var}\,E[W^*-W\mid\mathbf X] = \mathrm{Var}(X_1)$, a constant.
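The two conditional variances in the previous paragraph can be reproduced exactly for i.i.d. Bernoulli$(p)$ summands. The following is a minimal sketch; the function name and the values $n = 6$, $p = 1/4$ are illustrative assumptions. Since size biasing a zero-one summand sets it to 1, we have $W^* - W = 1 - X_I$.

```python
from fractions import Fraction
from itertools import product


def cond_var_of_increment(n, p, randomize):
    """Var E[W* - W | X] for W = X_1 + ... + X_n with X_j iid Bernoulli(p)."""
    vals, probs = [], []
    for x in product((0, 1), repeat=n):
        w = sum(x)
        prob = p ** w * (1 - p) ** (n - w)
        # Size biasing a zero-one summand sets it to 1, so W* - W = 1 - X_I.
        if randomize:
            inc = sum(Fraction(1, n) * (1 - xi) for xi in x)  # I uniform on 1..n
        else:
            inc = Fraction(1 - x[0])  # I = 1 deterministically
        vals.append(inc)
        probs.append(prob)
    mean = sum(v * q for v, q in zip(vals, probs))
    return sum((v - mean) ** 2 * q for v, q in zip(vals, probs))


p = Fraction(1, 4)
print(cond_var_of_increment(6, p, True))   # 1/32, i.e. p(1 - p)/n
print(cond_var_of_increment(6, p, False))  # 3/16, i.e. p(1 - p)
```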

The following lemma of Dembo and Rinott (1994) shows how to size bias a sum of the form $W = \sum_{j=1}^n\psi_j(U_j)$ by working with the distribution of the arguments $U_1,\dots,U_n$.

Lemma 2.2 Let $\mathbf U = (U_1,\dots,U_n)$ be a random $n$-vector, and let $\psi_i$ be nonnegative functions such that $E\psi_i(U_i)<\infty$, $i = 1,\dots,n$. Let $\mathbf Y^{(i)} = (Y_1^{(i)},\dots,Y_n^{(i)})$ satisfy $P(\mathbf Y^{(i)}\in d\mathbf y) = P(\mathbf U\in d\mathbf y)\psi_i(y_i)/E\psi_i(U_i)$. Let $I$ be a random variable taking values in $\{1,\dots,n\}$, distributed independently of all the above variables, with $P(I=i) = E\psi_i(U_i)/\sum_{j=1}^nE\psi_j(U_j)$.

Let $W = \sum_{j=1}^n\psi_j(U_j)$ have the distribution $F$. Then $W^* = \sum_{j=1}^n\psi_j(Y_j^{(I)})$ has the distribution $w\,dF(w)/\lambda$, where $\lambda = EW$.

Note that, with $F_i$ denoting the marginal distribution function of $U_i$, the distribution of $\mathbf Y^{(i)}$ is obtained as follows. Let $Y_i^{(i)}$ have the marginal distribution $\psi_i(\cdot)\,dF_i(\cdot)/E\psi_i(U_i)$. If $Y_i^{(i)} = u$, let $(Y_1^{(i)},\dots,Y_{i-1}^{(i)},Y_{i+1}^{(i)},\dots,Y_n^{(i)})$ have the distribution of $(U_1,\dots,U_{i-1},U_{i+1},\dots,U_n)$ conditioned on $U_i = u$.

To summarize, given $W = \sum_{j=1}^n\psi_j(U_j)$, this suggests the following:

Construction of $W^*$. Choose a random index $I$ as in the lemma. If $I = i$, let $Y_i\sim\psi_i(u)\,dF_i(u)/E\psi_i(U_i)$. If $Y_i$ is assigned the value $u$, let $(Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n)$ have the conditional distribution of $(U_1,\dots,U_{i-1},U_{i+1},\dots,U_n)$ given $U_i = u$. Now set $W^* = \sum_{j=1}^n\psi_j(Y_j)$.

If $(U_1,\dots,U_n)$ are Gaussian or multinomial, then an explicit construction of such variables $(Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n)$ having the required conditional distribution, jointly with $(U_1,\dots,U_n)$, is possible. More details on applications of such constructions to sums of nonlinear functions of Gaussian and multinomial variables are given in Section 4.

3 Proofs 

Before proving Theorems 1.2 and 1.4, we need the following lemma, the proof of which can be found in Barbour (1990) or Götze (1991).

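Let $\mathbf Z$ be a standard $p$-variate normal vector, and for $u>0$ define

$$(T_uh)(\mathbf w) = E\{h(\mathbf we^{-u} + \sqrt{1-e^{-2u}}\,\mathbf Z)\}.$$

Lemma 3.1 Let $h:\mathbb R^p\to\mathbb R$ have three bounded derivatives. Then

$$g(\mathbf w) = -\int_0^\infty[T_uh(\mathbf w) - \Phi h]\,du$$

solves

$$\mathrm{tr}\,D^2g(\mathbf w) - \mathbf w\cdot\nabla g(\mathbf w) = h(\mathbf w) - \Phi h,$$

and for any $k$th partial derivative we have the bound

$$\left\|\frac{\partial^kg}{\partial w_{i_1}\cdots\partial w_{i_k}}\right\| \le \frac1k\left\|\frac{\partial^kh}{\partial w_{i_1}\cdots\partial w_{i_k}}\right\|.$$

Further, for any $\boldsymbol\lambda\in\mathbb R^p$ and positive definite $p\times p$ matrix $\Sigma$, the function $f$ defined by the change of variable

$$f(\mathbf w) = g(\Sigma^{-1/2}(\mathbf w-\boldsymbol\lambda)) \tag{12}$$

solves

$$\mathrm{tr}\,\Sigma D^2f(\mathbf w) - (\mathbf w-\boldsymbol\lambda)\cdot\nabla f(\mathbf w) = h(\Sigma^{-1/2}(\mathbf w-\boldsymbol\lambda)) - \Phi h, \tag{13}$$

and hence

$$\left\|\frac{\partial^kf}{\partial w_{i_1}\cdots\partial w_{i_k}}\right\| \le \frac{p^k}{k}\|\Sigma^{-1/2}\|^k\|D^kh\|. \tag{14}$$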

Proof: One can follow Barbour (1990) to show that $g$ is a solution, and that, under the assumptions above, by dominated convergence,

$$D^kg(\mathbf w) = -\int_0^\infty e^{-ku}E\{D^kh(\mathbf we^{-u} + \sqrt{1-e^{-2u}}\,\mathbf Z)\}\,du.$$

The lemma now follows by straightforward calculations. □

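Proof of Theorem 1.2. Given $h$, let $f$ be the solution of (13) given by (12). Writing out the expressions in (13), we have

$$E\{h(\Sigma^{-1/2}(\mathbf W-\boldsymbol\lambda)) - \Phi h\} = E\Big\{\sum_{i=1}^p\sum_{j=1}^p\sigma_{ij}\frac{\partial^2}{\partial w_i\partial w_j}f(\mathbf W) - \sum_{i=1}^p(W_i-\lambda_i)\frac{\partial}{\partial w_i}f(\mathbf W)\Big\}. \tag{15}$$

Recall that $\mathbf W^i$ can be characterized by (5):

$$EW_iG(\mathbf W) = \lambda_iEG(\mathbf W^i),$$

holding for all functions $G:\mathbb R^p\to\mathbb R$ for which the expectations exist. Identity (5) is equivalent to

$$E(W_i-\lambda_i)G(\mathbf W) = \lambda_iE[G(\mathbf W^i) - G(\mathbf W)]. \tag{16}$$

For the coordinate function $G(\mathbf w) = w_j$ we obtain

$$\sigma_{ij} = \mathrm{Cov}(W_i,W_j) = EW_iW_j - \lambda_i\lambda_j = \lambda_iE(W_j^i - W_j); \tag{17}$$

when $i = j$, this recovers the one dimensional relation $\lambda E(W^*-W) = \sigma^2$ given in Baldi, Rinott and Stein (1989), where $W^*$ has the $W$-size biased distribution. Equations (15) and (16) with $G = \partial f/\partial w_i$ yield

$$E\{h(\Sigma^{-1/2}(\mathbf W-\boldsymbol\lambda)) - \Phi h\} = E\Big\{\sum_{i=1}^p\sum_{j=1}^p\sigma_{ij}\frac{\partial^2}{\partial w_i\partial w_j}f(\mathbf W) - \sum_{i=1}^p\lambda_i\Big[\frac{\partial}{\partial w_i}f(\mathbf W^i) - \frac{\partial}{\partial w_i}f(\mathbf W)\Big]\Big\}. \tag{18}$$

Taylor expansion of $\frac{\partial}{\partial w_i}f(\mathbf W^i)$ centered at $\mathbf W$, with remainder in integral form, and simple calculations show that (18) equals

$$\begin{aligned}&-E\Big\{\sum_{i=1}^p\sum_{j=1}^p\big[\lambda_i(W_j^i-W_j) - \sigma_{ij}\big]\frac{\partial^2}{\partial w_i\partial w_j}f(\mathbf W)\Big\}\\ &-E\Big\{\sum_{i=1}^p\sum_{j=1}^p\sum_{k=1}^p\lambda_i\int_0^1(1-t)\frac{\partial^3}{\partial w_i\partial w_j\partial w_k}f(\mathbf W + t(\mathbf W^i-\mathbf W))(W_j^i-W_j)(W_k^i-W_k)\,dt\Big\}.\end{aligned}\tag{19}$$

In the first term, we condition on $\mathbf W$, apply the Cauchy-Schwarz inequality, use (17), and then apply the bound (14) with $k = 2$ to obtain the first term in (6). The second term in (19) gives the second term in (6) by applying (14) with $k = 3$. □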

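Proof of Theorem 1.4. Our proof extends and simplifies the proof of Stein (1986) in the univariate case.

With $\mathbf W = (W_1,\dots,W_p)$, where $W_j = \sum_{\alpha\in A_j}X_\alpha$, let $T_j^\alpha = \sum_{\beta\in A_j\cap S_\alpha}X_\beta$ and $W_j^{(\alpha)} = W_j - T_j^\alpha$, and set $\mathbf W^{(\alpha)} = (W_1^{(\alpha)},\dots,W_p^{(\alpha)})$.

Let $f$ be the solution of (13) given by (12) for a test function $h$. Writing (13) while noting that $\boldsymbol\lambda = 0$, and subtracting and adding a term at the end of the expression, we obtain

$$\begin{aligned}E\{h(\Sigma^{-1/2}\mathbf W) - \Phi h\} = E\Big\{&\sum_{i=1}^p\sum_{j=1}^p\sigma_{ij}\frac{\partial^2}{\partial w_i\partial w_j}f(\mathbf W) - \sum_{i=1}^p\sum_{\alpha\in A_i}X_\alpha\Big[\frac{\partial}{\partial w_i}f(\mathbf W) - \frac{\partial}{\partial w_i}f(\mathbf W^{(\alpha)})\Big]\\ &- \sum_{i=1}^p\sum_{\alpha\in A_i}X_\alpha\frac{\partial}{\partial w_i}f(\mathbf W^{(\alpha)})\Big\}.\end{aligned}\tag{20}$$

Taylor expansion of $\frac{\partial}{\partial w_i}f(\mathbf W^{(\alpha)})$ centered at $\mathbf W$ and some rearrangement show that the $i$th summand in the above expression equals

$$\begin{aligned}&E\Big\{\sum_{j=1}^p\Big(\sigma_{ij} - \sum_{\alpha\in A_i}X_\alpha T_j^\alpha\Big)\frac{\partial^2}{\partial w_i\partial w_j}f(\mathbf W)\Big\} - \sum_{\alpha\in A_i}E\Big\{X_\alpha\frac{\partial}{\partial w_i}f(\mathbf W^{(\alpha)})\Big\}\\ &+ E\Big\{\sum_{j=1}^p\sum_{k=1}^p\sum_{\alpha\in A_i}X_\alpha\int_0^1(1-t)\frac{\partial^3}{\partial w_i\partial w_j\partial w_k}f(\mathbf W + t(\mathbf W^{(\alpha)}-\mathbf W))(W_j-W_j^{(\alpha)})(W_k-W_k^{(\alpha)})\,dt\Big\}.\end{aligned}\tag{21}$$

Using (14), applying the Cauchy-Schwarz inequality to the first expectation in (21), and performing elementary calculations on the remaining two terms yield the three terms of (9), respectively. □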



4 Examples 

4.1 Sums of nonlinear functions 

Various detailed applications of Theorem 1.1 in the setting of (3) and Lemma 2.2, with related references, are given in Dembo and Rinott (1994). We highlight two problems:

Theorem 4.1 Let $\mathbf U = (U_1,\dots,U_n)$ have the multivariate normal distribution $N(0,S)$, where $S = (\rho_{ij})$ satisfies $\rho_{ii} = 1$ for all $i$, and $\max_{i\ne j}|\rho_{ij}|\le r<1/3$. Let $W = \sum_{j=1}^n\psi(U_j)$, where $0<\psi(u)<Ke^{|u|^q}$ for some $K>0$ and $q<2$, and $\psi$ is scaled such that $E\psi(U_j) = 1$ (hence $EW = n$). Denote $\sigma^2 = \mathrm{Var}\,W$. Suppose $\max_i\sum_{j\ne i}|\rho_{ij}|\le B<\infty$. Define $D = \max\{\|h\|,\|h'\|\}$. Then, for some $C = C(r,K,q)<\infty$,

$$\left|Eh\!\left(\frac{W-n}{\sigma}\right) - \Phi h\right| \le [\text{right-hand side illegible in the source}]. \tag{22}$$

The construction of $W^*$ utilizes the well known structure of conditional distributions in the Gaussian case.

An analogous normal approximation holds for $W = \sum_{i=1}^n\psi(U_i)$ when $\mathbf U$ is a vector of multinomial variables with equal (or commensurate) cell probabilities and $\sum_{i=1}^nU_i = kn$ for some integer $k>0$. In view of Lemma 2.2, the construction of the coupling can be done as follows. Think of $(U_1,\dots,U_n)$ as counting the distribution of $kn$ balls in $n$ cells. Choose a cell at random, and if cell $i$ is chosen, reset the number of balls in it according to the distribution $\psi(u)P(U_i = u)/E\psi(U_i)$. If doing this requires the addition of balls into cell $i$, these balls are chosen with equal probability per ball from the other cells. If the resetting requires a reduction in the number of balls in cell $i$, then a suitable number of balls is redistributed at random in the remaining cells. Now $W^*$ is the sum of the function $\psi$ applied to these adjusted cell counts. This defines $W^*$ on a joint space with $W$, allowing the calculation of the bound in Theorem 1.1. This construction generalizes to any situation where $(U_1,\dots,U_n)$ have the same distribution as that of some $n$ i.i.d. variables conditioned on their sum.
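The ball-moving construction can be verified exactly when there are only a few balls and cells. By Lemma 2.2, the adjusted count vector $\mathbf y$ should have law $P(\mathbf U=\mathbf y)\sum_i\psi(y_i)/EW$. The following is a minimal sketch, assuming equal cell probabilities and an arbitrary positive $\psi(u) = u^2+1$ chosen only for illustration; the helper names are hypothetical, and Python 3.8 or later is assumed for `math.comb` and `math.prod`.

```python
from fractions import Fraction
from math import comb, factorial, prod


def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(counts, cells):
    """P(counts) when sum(counts) balls fall independently and uniformly into `cells` cells."""
    total = sum(counts)
    coef = factorial(total) // prod(factorial(c) for c in counts)
    return Fraction(coef, cells ** total)


def coupled_counts(n, k, psi):
    """Exact law of the adjusted cell counts from the ball-moving construction,
    for N = k n balls in n equally likely cells."""
    N = k * n
    law_U = {u: multinomial_pmf(u, n) for u in compositions(N, n)}
    # A single cell count is Binomial(N, 1/n); tilt it by psi.
    marg = {u: Fraction(comb(N, u) * (n - 1) ** (N - u), n ** N) for u in range(N + 1)}
    e_psi = sum(psi(u) * q for u, q in marg.items())
    tilted = {u: psi(u) * q / e_psi for u, q in marg.items()}
    out = {}
    for u_vec, p_u in law_U.items():
        for i in range(n):  # equal cell probabilities, so the cell is chosen uniformly
            others = u_vec[:i] + u_vec[i + 1:]
            for new, p_new in tilted.items():
                if p_new == 0:
                    continue
                m = new - u_vec[i]
                if m >= 0:
                    # Take m balls from the other cells, equally likely per ball.
                    moves = [(tuple(c - r for c, r in zip(others, rem)),
                              Fraction(prod(comb(c, r) for c, r in zip(others, rem)),
                                       comb(sum(others), m)))
                             for rem in compositions(m, n - 1)
                             if all(r <= c for r, c in zip(rem, others))]
                else:
                    # Scatter the -m surplus balls uniformly over the other cells.
                    moves = [(tuple(c + a for c, a in zip(others, add)),
                              multinomial_pmf(add, n - 1))
                             for add in compositions(-m, n - 1)]
                for new_others, p_move in moves:
                    y = new_others[:i] + (new,) + new_others[i:]
                    out[y] = out.get(y, 0) + p_u * Fraction(1, n) * p_new * p_move
    return out, law_U


def psi(u):
    return u * u + 1  # arbitrary positive function, for illustration only


out, law_U = coupled_counts(3, 2, psi)
```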

4.2 Graph degree counts 

Let $K = K_{n,\pi}$ be a random graph on the vertex set $\{1,\dots,n\}$, where each pair of vertices has probability $\pi$ of making up an edge, independently of all other such pairs. For distinct, fixed $d_i$, $i = 1,\dots,p$, let $W_i$ be the number of vertices of degree $d_i$. Set $\mathbf W = (W_1,\dots,W_p)$, $E\mathbf W = \boldsymbol\lambda = (\lambda_1,\dots,\lambda_p)$, and $\mathrm{Var}\,\mathbf W = \Sigma = (\sigma_{ij})$. For explicit expressions for $\boldsymbol\lambda$ and $\Sigma$, see below. Set

$$\beta(i) = \binom{n-1}{d_i}\pi^{d_i}(1-\pi)^{n-1-d_i},\quad\text{and}\quad S = [\text{expression illegible in the source}]. \tag{23}$$

The theorem below can be extended to the case $n\pi_n\to c>0$ as $n\to\infty$; for simplicity we assume $0<\pi = c/(n-1)<1$.

Theorem 4.2 If $\pi = \pi_n = c/(n-1)$, then for any $h:\mathbb R^p\to\mathbb R$ having bounded mixed partial derivatives up to order 3,

$$[\text{bound illegible in the source}] \tag{24}$$

where

$$\lambda_i = n\beta(i),\qquad \sigma_{ij} = \mathrm{Cov}(W_i,W_j) = n\beta(i)\beta(j)\left[\frac{(d_i-c)(d_j-c)}{c(1-c/(n-1))} - 1\right] + \delta_{ij}n\beta(i), \tag{25}$$

$\delta_{ij}$ is 1 if $i = j$ and 0 otherwise, and $M$ is a universal constant. Asymptotic joint normality obviously follows.

For the case $p = 1$, Karoński and Ruciński (1987) proved asymptotic normality when $n^{(d+1)/d}\pi\to\infty$ and $\pi n\to0$, or when $\pi n\to\infty$ and $\pi n - \log n - d\log\log n\to-\infty$. See also Palka (1984) and Bollobás (1985). Asymptotic normality when $\pi n\to c>0$ was obtained by Barbour, Karoński and Ruciński (1989). See also Kordecki (1990) for the one dimensional distribution of the number of vertices of degree zero, for nonsmooth $h$. Numerous univariate results on asymptotic normality of counts on random graphs, including counts of the type discussed in Theorems 4.2 and 4.3, are given in Janson and Nowicki (1991) and references therein.

We remark that the calculation of a bound on the conditional variance in Theorem 1.2, as well as of the other terms, is usually involved in nontrivial cases. The technical details omitted in the following sketch of the proof of Theorem 4.2 are available in Goldstein and Rinott (1994).

Sketch of Proof: Let $D(v)$ denote the degree of vertex $v$ in $K$, and set $X_{vi} = 1$ if $D(v) = d_i$ and 0 otherwise. We have $W_i = \sum_{v=1}^nX_{vi}$, $i = 1,\dots,p$. Note that $D(v)\sim\mathrm{Binomial}(n-1,\pi)$, which approaches Poisson$(c)$ as $n\to\infty$. Note also that $\beta(i)$ in (23) equals $EX_{vi} = P(D(v) = d_i)$, so the expression for $\lambda_i$ in (25) is immediate. Also, by conditioning on the existence of an edge between vertices $v$ and $u$ and then unconditioning, we can compute $EX_{vi}X_{uj}$, and a straightforward
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calculation leads to the expression for cjjj in H25() . Using the fact that the maximum absolute 
value of is bounded by the largest eigenvalue of S^^/^, and invoking the Rayleigh-Ritz 

characterization of eigenvalues, we obtain with some calculation, < ^-1/2^1/2^ 
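As a numerical sanity check on (25), the sketch below evaluates $\lambda_i$ and $\sigma_{ij}$ from the closed form and compares $\sigma_{ij}$ with the value obtained by conditioning on the edge between two distinct vertices $u \ne v$: given the status of that edge, $D(u)$ and $D(v)$ are the edge indicator plus independent Binomial($n-2, \pi$) variables. The function names and the parameter values ($n = 60$, $c = 2$, degrees $0, 1, 2, 5$) are illustrative assumptions, not part of the original analysis.

```python
from math import comb


def binom_pmf(m, q, k):
    """P(Binomial(m, q) = k); zero for k outside 0..m."""
    if k < 0 or k > m:
        return 0.0
    return comb(m, k) * q**k * (1 - q) ** (m - k)


def closed_form_moments(n, c, degrees):
    """lambda_i and sigma_ij from (23) and (25), with pi = c/(n-1)."""
    pi = c / (n - 1)
    beta = [binom_pmf(n - 1, pi, d) for d in degrees]
    lam = [n * b for b in beta]
    p = len(degrees)
    sigma = [
        [
            n * beta[i] * beta[j]
            * ((degrees[i] - c) * (degrees[j] - c) / (c * (1 - c / (n - 1))) - 1)
            + (n * beta[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]
    return lam, sigma


def covariance_by_conditioning(n, c, degrees):
    """sigma_ij as the sum over ordered vertex pairs (u, v) of Cov(X_ui, X_vj)."""
    pi = c / (n - 1)
    beta = [binom_pmf(n - 1, pi, d) for d in degrees]
    p = len(degrees)
    sigma = [[0.0] * p for _ in range(p)]
    for i, di in enumerate(degrees):
        for j, dj in enumerate(degrees):
            # u == v: X_vi X_vj = X_vi if i == j, else 0 (the degrees are distinct)
            same = (beta[i] if i == j else 0.0) - beta[i] * beta[j]
            # u != v: condition on whether the edge (u, v) is present
            joint = pi * binom_pmf(n - 2, pi, di - 1) * binom_pmf(n - 2, pi, dj - 1) + (
                1 - pi
            ) * binom_pmf(n - 2, pi, di) * binom_pmf(n - 2, pi, dj)
            sigma[i][j] = n * same + n * (n - 1) * (joint - beta[i] * beta[j])
    return sigma


lam, sigma = closed_form_moments(n=60, c=2.0, degrees=[0, 1, 2, 5])
check = covariance_by_conditioning(n=60, c=2.0, degrees=[0, 1, 2, 5])
```

The two computations agree up to floating-point rounding, which is one way to catch sign or index slips when transcribing (25).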

The construction required for the application of Theorem 1.2 is straightforward. Fix $i$ and let $V$ be uniformly distributed on the vertex set $\{1, \ldots, n\}$, independent of $K$. (Here $V$ is uniform because the means $EX_{vi}$, $v = 1, \ldots, n$, are all equal for fixed $i$.) Now $D(V)$ denotes the degree of the randomly chosen vertex $V$. If $D(V) > d_i$, define $K^i$ to differ from $K$ only in that $D(V) - d_i$ edges, selected uniformly from the $D(V)$ edges at $V$, are removed from the edge set. If $D(V) < d_i$, define $K^i$ to be the graph obtained from $K$ by adding $d_i - D(V)$ edges of the form $(V, u)$, where the vertices $u$ are selected uniformly from the $n - 1 - D(V)$ vertices not connected to $V$. If $D(V) = d_i$, set $K^i = K$. Clearly, if $V = v$, then in the new graph the degree of $v$ is $d_i$, so that the indicator $X_{vi}$ is size biased to 1, and the distribution of $K^i$ is the same as the conditional distribution of $K$ given $X_{vi} = 1$.

Define $W^i$ to be related to $K^i$ as $W$ is related to $K$; that is, set $X^i_{uj} = 1$ if $D(u) = d_j$ in the graph $K^i$ and 0 otherwise, and $W^i_j = \sum_{u=1}^{n} X^i_{uj}$, $j = 1, \ldots, p$. From the discussion in Section 2 it follows that this procedure defines $W^i$ as in Definition 1.2 and Theorem 1.2.
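The construction of $K^i$ and $W^i$ can be sketched as follows, assuming an adjacency-set representation with vertices labeled $0, \ldots, n-1$ and $d_i \le n - 1$ (so that enough non-neighbours exist). The function names and parameters are hypothetical; this is an illustrative simulation, not code from the paper.

```python
import random


def random_graph(n, pi, rng):
    """Erdos-Renyi graph on vertices 0..n-1, stored as a list of neighbour sets."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < pi:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def size_biased_graph(adj, d_i, rng):
    """Return (V, K^i): a copy of K in which a uniform vertex V is given degree d_i."""
    n = len(adj)
    new = [set(s) for s in adj]  # K^i starts as a copy of K
    v = rng.randrange(n)  # V uniform on the vertex set
    deg = len(new[v])
    if deg > d_i:
        # remove D(V) - d_i edges chosen uniformly from the D(V) edges at V
        for u in rng.sample(sorted(new[v]), deg - d_i):
            new[v].discard(u)
            new[u].discard(v)
    elif deg < d_i:
        # add d_i - D(V) edges to vertices chosen uniformly among non-neighbours of V
        non_neighbours = [u for u in range(n) if u != v and u not in new[v]]
        for u in rng.sample(non_neighbours, d_i - deg):
            new[v].add(u)
            new[u].add(v)
    return v, new


def degree_counts(adj, degrees):
    """W_j = number of vertices whose degree equals degrees[j]."""
    deg = [len(s) for s in adj]
    return [deg.count(d) for d in degrees]


rng = random.Random(0)
degrees = [1, 2, 4]
K = random_graph(40, 2.0 / 39, rng)
V, K_i = size_biased_graph(K, degrees[2], rng)  # size bias in the direction with d_i = 4
W, W_i = degree_counts(K, degrees), degree_counts(K_i, degrees)
```

Because only $V$ and the endpoints of the added or removed edges change degree, such samples always satisfy $|W^i_j - W_j| \le |D(V) - d_i| + 1$, the inequality used at the end of the proof.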

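To obtain a tractable bound on the first term on the right-hand side of the bound in Theorem 1.2, we condition on a larger σ-field, as discussed in Section 2. Specifically, we use the relation $\operatorname{Var} E[W^i_j - W_j \mid W] \le \operatorname{Var} E[W^i_j - W_j \mid K]$, which holds because $W$ is a function of $K$, and show that

$$\operatorname{Var} E[W^i_j - W_j \mid K] = O\!\left(\frac{(1 + c^2)}{n}\left(1 + \frac{d_i^2}{1 - c/(n-1)}\right)\right). \tag{26}$$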



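Let $\mathcal{E}$ denote the edge set of $K$ and $|\cdot|$ cardinality. Conditioning on $V = v$ and then taking expectation, recalling $P(V = v) = 1/n$, we obtain

$$\begin{aligned}
E[W^i_j - W_j \mid K] ={}& \frac{1}{n}\sum_{v:\,D(v) > d_i}\Bigl(\bigl|\{u : (u,v) \in \mathcal{E},\ D(u) = d_j + 1\}\bigr| - \bigl|\{u : (u,v) \in \mathcal{E},\ D(u) = d_j\}\bigr|\Bigr)\frac{D(v) - d_i}{D(v)} \\
&+ \frac{1}{n}\sum_{v:\,D(v) < d_i}\Bigl(\bigl|\{u \ne v : (u,v) \notin \mathcal{E},\ D(u) = d_j - 1\}\bigr| - \bigl|\{u \ne v : (u,v) \notin \mathcal{E},\ D(u) = d_j\}\bigr|\Bigr)\frac{d_i - D(v)}{n - 1 - D(v)} \\
&+ \frac{\delta_{ij}}{n}\bigl|\{v : D(v) \ne d_i\}\bigr| - \frac{1 - \delta_{ij}}{n}\bigl|\{v : D(v) = d_j\}\bigr|.
\end{aligned} \tag{27}$$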
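Formula (27) can be checked exactly on a small graph. The sketch below, a hypothetical pair of helpers using exact rational arithmetic, evaluates the right-hand side of (27) and, separately, computes $E[W^i_j - W_j \mid K]$ by brute force over all $n$ choices of $V$ and all equally likely sets of edges to remove or add. Vertices are labeled $0, \ldots, n-1$, and the degrees are assumed distinct, so that $i = j$ exactly when $d_i = d_j$.

```python
from fractions import Fraction
from itertools import combinations


def formula_27(adj, d_i, d_j):
    """Right-hand side of (27) for a fixed graph K given as neighbour sets."""
    n = len(adj)
    deg = [len(s) for s in adj]
    total = Fraction(0)
    for v in range(n):
        if deg[v] > d_i:  # edges at v are removed, lowering neighbour degrees
            up = sum(deg[u] == d_j + 1 for u in adj[v])
            down = sum(deg[u] == d_j for u in adj[v])
            total += (up - down) * Fraction(deg[v] - d_i, deg[v])
        elif deg[v] < d_i:  # edges to non-neighbours are added, raising their degrees
            non = [u for u in range(n) if u != v and u not in adj[v]]
            up = sum(deg[u] == d_j - 1 for u in non)
            down = sum(deg[u] == d_j for u in non)
            total += (up - down) * Fraction(d_i - deg[v], n - 1 - deg[v])
    # change in the indicator of V itself, whose degree becomes d_i
    if d_i == d_j:
        total += sum(x != d_i for x in deg)
    else:
        total -= sum(x == d_j for x in deg)
    return total / n


def exact_conditional_mean(adj, d_i, d_j):
    """E[W_j^i - W_j | K] by enumerating V and every equally likely edge choice."""
    n = len(adj)
    deg = [len(s) for s in adj]
    base = deg.count(d_j)
    total = Fraction(0)
    for v in range(n):
        if deg[v] > d_i:
            moves, step = list(combinations(sorted(adj[v]), deg[v] - d_i)), -1
        elif deg[v] < d_i:
            non = [u for u in range(n) if u != v and u not in adj[v]]
            moves, step = list(combinations(non, d_i - deg[v])), +1
        else:
            moves, step = [()], 0
        acc = 0
        for chosen in moves:
            new = deg[:]
            new[v] = d_i
            for u in chosen:
                new[u] += step
            acc += new.count(d_j) - base
        total += Fraction(acc, len(moves))
    return total / n


# a 4-cycle 0-1-2-3 with a pendant path 3-4-5
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5)]
K = [set() for _ in range(6)]
for a, b in edges:
    K[a].add(b)
    K[b].add(a)
```

On this graph, for example, both functions give $-13/12$ for $d_i = 1$, $d_j = 2$.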



To understand the first term, for example, note that if $V = v$ and $D(v) > d_i$, then $X^i_{uj} - X_{uj} = 1$ if $(u, v) \in \mathcal{E}$, $D(u) = d_j + 1$, and $(u, v)$ is one of the $D(v) - d_i$ edges removed at random at $v$, an event of probability $(D(v) - d_i)/D(v)$.

The variance of the expression in (27) can be bounded by computing the covariances between its terms. This involves conditioning on events to induce independence of the factors appearing in products within the covariances, simple coupling arguments, and various moment inequalities.

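The bound for the second term on the right-hand side of the bound in Theorem 1.2 is obtained by noting that $|W^i_j - W_j| \le |D(V) - d_i| + 1$, since only $V$ and the $|D(V) - d_i|$ other endpoints of added or removed edges change degree, and applying simple calculations related to the Binomial distribution of $D(V)$. □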
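The only randomness in the bound $|W^i_j - W_j| \le |D(V) - d_i| + 1$ comes from $D(V) \sim \text{Binomial}(n-1, c/(n-1))$. As an illustration, the sketch below computes moments of this bound exactly and compares them with the Poisson($c$) limit, showing that they stay bounded as $n$ grows. The choice of the second moment, $c = 2$ and $d_i = 3$, and the helper names are assumptions for illustration, not taken from the proof.

```python
from math import exp, lgamma, log


def binomial_pmf(m, q, k):
    """P(Binomial(m, q) = k), computed in log space to avoid overflow for large m."""
    if k < 0 or k > m:
        return 0.0
    log_p = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1) + k * log(q) + (m - k) * log(1 - q)
    return exp(log_p)


def poisson_pmf(c, k):
    """P(Poisson(c) = k)."""
    return exp(-c + k * log(c) - lgamma(k + 1))


def moment_of_bound(pmf, d_i, power, kmax=80):
    """E[(|D - d_i| + 1)^power] for a degree law pmf(k), truncated at kmax."""
    return sum(pmf(k) * (abs(k - d_i) + 1) ** power for k in range(kmax + 1))


c, d_i = 2.0, 3
for n in (50, 500, 5000):
    m = moment_of_bound(lambda k: binomial_pmf(n - 1, c / (n - 1), k), d_i, 2)
    print(n, m)
print("Poisson limit", moment_of_bound(lambda k: poisson_pmf(c, k), d_i, 2))
```

The truncation at `kmax` is harmless here because the Binomial and Poisson tails beyond 80 are negligible for $c = 2$.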



4.3 Graph Vertex Color Matching 

Let $G = G_n$ be a fixed regular graph on a vertex set $V$ of size $n$, with each vertex $v \in V$ of degree $d$. The regularity of $G$ implies that the set $\mathcal{E}$ of edges of $G$ has size $N = nd/2$. Let $C = \{1, \ldots, p\}$ be a set of $p$ colors, and suppose that each vertex $v \in V$ is independently assigned color $i$ with probability $\pi_i$, where $\sum_i \pi_i = 1$. Let $B = \bigl[\min_i \pi_i^2(1 - \pi_i)\bigr]^{-1}$.

Theorem 4.3 Let $W = (W_1, \ldots, W_p)$, where $W_i$, $i = 1, \ldots, p$, is the number of edges of $G$ that have both vertices of color $i$. Then $EW = \lambda = N(\pi_1^2, \ldots, \pi_p^2)$ and $\operatorname{Var}(W) = \Sigma = (\sigma_{ij})$ as given in equations (29) and (30), and for any $h : \mathbb{R}^p \to \mathbb{R}$ having bounded mixed partial derivatives up to order 3,

$$\bigl|Eh(\Sigma^{-1/2}(W - \lambda)) - \Phi h\bigr| \le M N^{-1/2}\bigl(p^2 B \|D^2 h\| d^{3/2} + p^3 B^{3/2} \|D^3 h\| d^2\bigr), \tag{28}$$

where $M$ is a universal constant. Asymptotic joint normality follows immediately.

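Proof: We first obtain $\Sigma = \operatorname{Var} W$ in order to bound $|\Sigma^{-1/2}|$. Let $X_{ei} = 1$ if edge $e$ has color $i$ on both vertices and 0 otherwise, so $W_i = \sum_{e \in \mathcal{E}} X_{ei}$ counts the number of edges with both vertices of color $i$. We have $EX_{ei} = \pi_i^2$ and $\operatorname{Var} X_{ei} = \pi_i^2(1 - \pi_i^2)$. Given an edge $e$, let $S_e$ denote the set of $2d - 1$ edges that share a vertex with $e$, including $e$ itself. For the $2(d-1)$ edges $f \in S_e$, $f \ne e$, $\operatorname{Cov}(X_{ei}, X_{fi}) = \pi_i^3 - \pi_i^4$, since the three vertices of $e$ and $f$ must all have color $i$. For $f \notin S_e$, this covariance is 0 by independence. Thus,

$$\sigma_{ii} = \operatorname{Var}(W_i) = N\pi_i^2(1 - \pi_i^2) + 2N(d-1)(\pi_i^3 - \pi_i^4). \tag{29}$$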

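For different colors $i \ne j$ and $f \in S_e$, we have $\operatorname{Cov}(X_{ei}, X_{fj}) = -\pi_i^2\pi_j^2$, because $X_{ei}X_{fj} = 0$ when $e$ and $f$ share a vertex; again, for $f \notin S_e$ this covariance is 0. Hence, for $i \ne j$,

$$\sigma_{ij} = \operatorname{Cov}(W_i, W_j) = -N(2d-1)\pi_i^2\pi_j^2. \tag{30}$$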
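Equations (29) and (30) can be confirmed by exact enumeration on small regular graphs. The sketch below, with hypothetical helper names and exact rational arithmetic, compares them with the covariance matrix obtained by summing over all $p^n$ colorings of the 6-cycle ($d = 2$) and of the complete graph on four vertices ($d = 3$); both have $N = 6$ edges. The color probabilities are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import product


def sigma_formula(N, d, probs):
    """Sigma from (29) (diagonal) and (30) (off-diagonal)."""
    p = len(probs)
    return [
        [
            N * probs[i] ** 2 * (1 - probs[i] ** 2) + 2 * N * (d - 1) * (probs[i] ** 3 - probs[i] ** 4)
            if i == j
            else -N * (2 * d - 1) * probs[i] ** 2 * probs[j] ** 2
            for j in range(p)
        ]
        for i in range(p)
    ]


def sigma_brute_force(n, edges, probs):
    """Exact Cov(W_i, W_j) by enumerating all p**n colorings of the vertices."""
    p = len(probs)
    mean = [Fraction(0)] * p
    second = [[Fraction(0)] * p for _ in range(p)]
    for colors in product(range(p), repeat=n):
        weight = Fraction(1)
        for col in colors:
            weight *= probs[col]
        w = [0] * p  # w[i]: number of edges with both endpoints of color i
        for a, b in edges:
            if colors[a] == colors[b]:
                w[colors[a]] += 1
        for i in range(p):
            mean[i] += weight * w[i]
            for j in range(p):
                second[i][j] += weight * w[i] * w[j]
    return [[second[i][j] - mean[i] * mean[j] for j in range(p)] for i in range(p)]


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
cycle6 = [(k, (k + 1) % 6) for k in range(6)]  # 2-regular, N = 6
complete4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # 3-regular, N = 6
```

Exact fractions make the comparison an identity check rather than a tolerance check.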

Let $\Lambda$ and $H$ be the diagonal matrices with $i$th diagonal entries $\pi_i^3$ and $\pi_i^2 - \pi_i^3$, respectively, and let $b$ be the column vector with $i$th component $\pi_i^2$. Then $\Sigma = N(2d-1)[\Lambda - bb^T] + NH$. To show that $\Sigma \succeq NH$, let $D$ be the diagonal matrix with diagonal entries $\pi_i^{3/2}$, and $g$ the column vector with entries $\pi_i^{1/2}$. Then $\Lambda - bb^T = D(I - gg^T)D$. Since $\sum_j \pi_j = 1$, $g$ is a unit vector, so the eigenvalues of $I - gg^T$ are 0 and 1, and the smallest is 0. Hence $\Lambda - bb^T$ is nonnegative definite and $\Sigma \succeq NH$ is established. It follows that $|\Sigma^{-1/2}| \le N^{-1/2}B^{1/2}$.
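The matrix identities in this argument can be verified exactly for one choice of parameters. The sketch below (hypothetical helper; $N = 10$, $d = 4$, as for the complete graph on five vertices, and illustrative color probabilities) rebuilds $\Sigma$ from $\Lambda$, $b$ and $H$, checks that $\Lambda - bb^T$ annihilates the vector $D^{-1}g$ with entries $1/\pi_i$, and computes $B$.

```python
from fractions import Fraction


def decomposition_check(N, d, probs):
    """Return Sigma from (29)-(30), the matrix N(2d-1)[Lambda - b b^T] + N H,
    and (Lambda - b b^T) applied to the vector u with u_i = 1 / pi_i."""
    p = len(probs)
    sigma = [
        [
            N * probs[i] ** 2 * (1 - probs[i] ** 2) + 2 * N * (d - 1) * (probs[i] ** 3 - probs[i] ** 4)
            if i == j
            else -N * (2 * d - 1) * probs[i] ** 2 * probs[j] ** 2
            for j in range(p)
        ]
        for i in range(p)
    ]
    # Lambda = diag(pi_i^3), H = diag(pi_i^2 - pi_i^3), b_i = pi_i^2
    lam_minus_bbt = [
        [(probs[i] ** 3 if i == j else 0) - probs[i] ** 2 * probs[j] ** 2 for j in range(p)]
        for i in range(p)
    ]
    rebuilt = [
        [
            N * (2 * d - 1) * lam_minus_bbt[i][j] + (N * (probs[i] ** 2 - probs[i] ** 3) if i == j else 0)
            for j in range(p)
        ]
        for i in range(p)
    ]
    # u = D^{-1} g has entries pi_i^{1/2} / pi_i^{3/2} = 1 / pi_i
    u = [1 / q for q in probs]
    kernel_image = [sum(lam_minus_bbt[i][j] * u[j] for j in range(p)) for i in range(p)]
    return sigma, rebuilt, kernel_image


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sigma, rebuilt, kernel_image = decomposition_check(N=10, d=4, probs=probs)
B = 1 / min(q**2 * (1 - q) for q in probs)  # here B = 216/5
```

Since $H \succeq B^{-1} I$, any vector $x$ then satisfies $x^T \Sigma x \ge (N/B)\,|x|^2$, which is the eigenvalue bound behind $|\Sigma^{-1/2}| \le N^{-1/2}B^{1/2}$.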

We now apply Theorem 1.4 to the mean zero variables $X_{ei} - \pi_i^2$. When the square in the first term of the bound in Theorem 1.4 is expanded and expectation is taken, most terms vanish by independence, and because $|S_e| = 2d - 1$, the number of summands that do not vanish under the root sign is of order $Nd^3$. The second term in the bound vanishes, and in the third term each expectation is of order